Goto

Collaborating Authors

 weight consolidation


Distributed Weight Consolidation: A Brain Segmentation Case Study

Neural Information Processing Systems

Collecting the large datasets needed to train deep neural networks can be very difficult, particularly for the many applications for which sharing and pooling data is complicated by practical, ethical, or legal concerns. However, it may be the case that derivative datasets or predictive models developed within individual sites can be shared and combined with fewer restrictions. Training on distributed data and combining the resulting networks is often viewed as continual learning, but these methods require networks to be trained sequentially. In this paper, we introduce distributed weight consolidation (DWC), a continual learning method to consolidate the weights of separate neural networks, each trained on an independent dataset. We evaluated DWC with a brain segmentation case study, where we consolidated dilated convolutional neural networks trained on independent structural magnetic resonance imaging (sMRI) datasets from different sites. We found that DWC led to increased performance on test sets from the different sites, while maintaining generalization performance for a very large and completely independent multi-site dataset, compared to an ensemble baseline.




Distributed Weight Consolidation: A Brain Segmentation Case Study

Neural Information Processing Systems

Collecting the large datasets needed to train deep neural networks can be very difficult, particularly for the many applications for which sharing and pooling data is complicated by practical, ethical, or legal concerns. However, it may be the case that derivative datasets or predictive models developed within individual sites can be shared and combined with fewer restrictions. Training on distributed data and combining the resulting networks is often viewed as continual learning, but these methods require networks to be trained sequentially. In this paper, we introduce distributed weight consolidation (DWC), a continual learning method to consolidate the weights of separate neural networks, each trained on an independent dataset. We evaluated DWC with a brain segmentation case study, where we consolidated dilated convolutional neural networks trained on independent structural magnetic resonance imaging (sMRI) datasets from different sites. We found that DWC led to increased performance on test sets from the different sites, while maintaining generalization performance for a very large and completely independent multi-site dataset, compared to an ensemble baseline.


Reviews: Distributed Weight Consolidation: A Brain Segmentation Case Study

Neural Information Processing Systems

The paper proposes a technique for learning a model by consolidating weights across models that are trained in different datasets. The proposed approach thus attempts to solve an important problem that arises by the limitations of sharing and pooling data. The authors take on the brain segmentation problem by using MeshNet architectures. The proposed method essentially starts from the model learned from one dataset, performs variational continual learning to parallel train across multiple datasets, and then performs bayesian parallel learning to fine tune the model on a dataset by using as prior the weights learned in parallel from the rest of the datasets. The proposed approach has been tested using free surfer segmentations for data part of the Human Connectome Project, the Nathan Kline Institute, the Buckner Lab and the ABIDE project.


EVCL: Elastic Variational Continual Learning with Weight Consolidation

Batra, Hunar, Clark, Ronald

arXiv.org Machine Learning

Continual learning aims to allow models to learn new tasks without forgetting what has been learned before. This work introduces Elastic Variational Continual Learning with Weight Consolidation (EVCL), a novel hybrid model that integrates the variational posterior approximation mechanism of Variational Continual Learning (VCL) with the regularization-based parameter-protection strategy of Elastic Weight Consolidation (EWC). By combining the strengths of both methods, EVCL effectively mitigates catastrophic forgetting and enables better capture of dependencies between model parameters and task-specific data. Evaluated on five discriminative tasks, EVCL consistently outperforms existing baselines in both domain-incremental and task-incremental learning scenarios for deep discriminative models.


SL: Stable Learning in Source-Free Domain Adaption for Medical Image Segmentation

Chen, Yixin, Wang, Yan

arXiv.org Artificial Intelligence

Deep learning techniques for medical image analysis usually suffer from the domain shift between source and target data. Most existing works focus on unsupervised domain adaptation (UDA). However, in practical applications, privacy issues are much more severe. For example, the data of different hospitals have domain shifts due to equipment problems, and data of the two domains cannot be available simultaneously because of privacy. In this challenge defined as Source-Free UDA, the previous UDA medical methods are limited. Although a variety of medical source-free unsupervised domain adaption (MSFUDA) methods have been proposed, we found they fall into an over-fitting dilemma called "longer training, worse performance." Therefore, we propose the Stable Learning (SL) strategy to address the dilemma. SL is a scalable method and can be integrated with other research, which consists of Weight Consolidation and Entropy Increase. First, we apply Weight Consolidation to retain domain-invariant knowledge and then we design Entropy Increase to avoid over-learning. Comparative experiments prove the effectiveness of SL. We also have done extensive ablation experiments. Besides, We will release codes including a variety of MSFUDA methods.


Adapting machine translation models to new genres

#artificialintelligence

Neural machine translation systems are often optimized to perform well for specific text genres or domains, such as newspaper articles, user manuals, or customer support chats. In industrial settings with hundreds of language pairs to serve, however, a single translation system per language pair, which performs well across different text domains, is more efficient to deploy and maintain. Additionally, service providers may not know in advance which domains customers will be interested in. At this year's Conference on Empirical Methods in Natural Language Processing (EMNLP), we are presenting a new approach to multidomain adaptation for neural translation models, or adapting an existing model to new domains while maintaining translation quality in the original domain. Our approach provides a better trade-off between performance on old and new tasks than its predecessors do.


Synaptic Metaplasticity in Binarized Neural Networks

Laborieux, Axel, Ernoult, Maxence, Hirtzlin, Tifenn, Querlioz, Damien

arXiv.org Machine Learning

While deep neural networks have surpassed human performance in multiple situations, they are prone to catastrophic forgetting: upon training a new task, they rapidly forget previously learned ones. Neuroscience studies, based on idealized tasks, suggest that in the brain, synapses overcome this issue by adjusting their plasticity depending on their past history. However, such "metaplastic" behaviour has never been leveraged to mitigate catastrophic forgetting in deep neural networks. In this work, we highlight a connection between metaplasticity models and the training process of binarized neural networks, a low-precision version of deep neural networks. Building on this idea, we propose and demonstrate experimentally, in situations of multitask and stream learning, a training technique that prevents catastrophic forgetting without needing previously presented data, nor formal boundaries between datasets. We support our approach with a theoretical analysis on a tractable task. This work bridges computational neuroscience and deep learning, and presents significant assets for future embedded and neuromorphic systems.


Distributed Weight Consolidation: A Brain Segmentation Case Study

McClure, Patrick, Zheng, Charles Y., Kaczmarzyk, Jakub, Rogers-Lee, John, Ghosh, Satra, Nielson, Dylan, Bandettini, Peter A., Pereira, Francisco

Neural Information Processing Systems

Collecting the large datasets needed to train deep neural networks can be very difficult, particularly for the many applications for which sharing and pooling data is complicated by practical, ethical, or legal concerns. However, it may be the case that derivative datasets or predictive models developed within individual sites can be shared and combined with fewer restrictions. Training on distributed data and combining the resulting networks is often viewed as continual learning, but these methods require networks to be trained sequentially. In this paper, we introduce distributed weight consolidation (DWC), a continual learning method to consolidate the weights of separate neural networks, each trained on an independent dataset. We evaluated DWC with a brain segmentation case study, where we consolidated dilated convolutional neural networks trained on independent structural magnetic resonance imaging (sMRI) datasets from different sites.